Antipattern: Use Null as an Ordinary Value, or Vice Versa

Let's see what happens when we treat the null value as an ordinary value or vice versa.

Many software developers are caught off-guard by the behavior of NULL in SQL. Unlike most programming languages, SQL treats NULL as a unique value that is distinct from zero, false, and empty strings. This is true in standard SQL and most brands of databases. However, in Oracle and Sybase, NULL is precisely the same as a string of zero length. The NULL value follows some unique behavior, too.

Using NULL value in expressions#

One case that surprises some people is when we perform arithmetic on a column or expression that is NULL. For example, many programmers would expect the result to be 10 for bugs that haven’t been given an estimate in the hours column, but instead, the query returns NULL.

Retrieving some null value by adding something in it returns null

NULL is not the same as zero. A number that is ten greater than an unknown number is still an unknown number.

NULL is not the same as a string of zero length. Combining any string with NULL in standard SQL returns NULL (despite its behavior in Oracle and Sybase).

NULL is not the same as false. Boolean expressions also produce results that some people find confusing.

Searching nullable columns#

The following query returns only those rows in which assigned_to has the value 123, and not rows with other values or in which the column is null:

Retrieving bugs assigned to 123

It may seem that the following query returns the complement set of rows to the previous query, that is, all those rows that are not returned by the previous query. However, this query doesn’t actually return any rows at all. Let’s try it in the following playground:

Retrieving bugs not assigned to 123

However, neither query result includes rows where assigned_to is NULL. Any comparison to null returns unknown, not true or false. The negation of NULL is still NULL.

It’s common to make the following mistakes when searching for NULL or non-NULL values:

Retrieving bugs assigned to NULL

Now, we will retrieve the bugs assigned to non-NULL in the following widget:

Retrieving bugs assigned to non-NULL

The condition in a WHERE clause is satisfied only when the expression is true, but a comparison to NULL is never true; it’s unknown. It doesn’t matter whether the comparison is for equality or inequality. Thus, neither of the previous queries returns rows where assigned_to is NULL.

Using NULL value in query parameters#

It’s also difficult to use NULL in a parameterized SQL expression as if NULL were an ordinary value.

Treating NULL as an ordinary value in the query parameter

The previous query returns predictable results when we send an ordinary integer value for the parameter, but we can’t use a literal NULL as the parameter.

Avoiding the issue#

If handling NULL makes queries more complex, many software developers choose to disallow NULL altogether in the database. Instead, they choose an ordinary value to signify “unknown” or “inapplicable.”

We hate NULLs!

What exactly is wrong with this practice? In the following example, let’s declare the previously nullable columns assigned_to and hours as NOT NULL:

Creating Bugs table by specifying NOT NULL for two columns

Let’s say we use -1 to represent an unknown value.

Trying to insert invalid data for NOT NULL columns

The hours column is numeric, so we’re restricted to a numeric value to mean “unspecified.” It has to have no meaning in that column, so we chose a negative value. But the value -1 would throw off calculations such as SUM() or AVG(). We have to exclude rows with this value, using special-case expressions, which is what we were trying to avoid by prohibiting null in the first place.

Retrieving records using aggregate function excluding records having -1 values

In another column, the value -1 might be significant, so we have to choose a different value on a case-by-case basis for each column. We also have to remember or document the special values used by each column. This adds a lot of arduous and unnecessary work to a project.

Now let’s look at the assigned_to column. It is a foreign key to the Accounts table. When a bug has been reported but not assigned yet, what non-NULL value can we use? Any non-NULL value must reference a row in Accounts, so we need to create a placeholder row in Accounts, meaning “no one“ or “unassigned.” It seems ironic to create an account to reference so that we can represent the absence of a reference to a real user’s account.

When we declare a column as NOT NULL, it should be because it would make no sense for the row to exist without a value in that column. For example, the Bugs.reported_by column must have a value because every bug was reported by someone. But a bug may exist without having been assigned yet. Missing values should be NULL.

Synopsis: Fear of the Unknown
Solution: Use NULL as a Unique Value
Mark as Completed
Report an Issue